Conversation
|
(I submitted without writing the description, writing it with edit ig.) |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #362 +/- ##
==========================================
- Coverage 82.64% 82.58% -0.07%
==========================================
Files 28 29 +1
Lines 3440 3479 +39
==========================================
+ Hits 2843 2873 +30
- Misses 427 433 +6
- Partials 170 173 +3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@jech you didn't like the current RTP interface, what do you think about this one? |
|
There are many things marked as deprecated in pion/rtp, it would be good to remove them during migration to v3. |
|
@sirzooro thank you for the reminder, and yes this won't be the last PR, we should also move some stuff around. |
|
|
||
| package rtp | ||
|
|
||
| // MediaFormat describes the bitstream or sample representation passed to a payload format. |
There was a problem hiding this comment.
Not sure how much is applicable here, but I think whenever possible the RFC should be referenced for naming and rational behind particular concepts. Even if its just a comment with the RFC # + section.
There was a problem hiding this comment.
This is mostly application logic but I'll try to reference all my reasons, and specs behind every interface.
|
I need to think it over, but at first sight this looks good. Two questions. Why do we need the PacketizerContext as an explicit parameter? The Packetizer is a stateful object, right, so why not make the context part of the Packetizer? Why do we need the Flush method? |
|
@jech yeah I agree it was the first iteration and I just added too much, going to remove context from the public input, the user only needs to configure de/packetizers once (e.g avc vs annexb for h264). |
|
***@***.**** commented on this pull request.
So every *depacketizer* would have to deal with the concepts of samples
and keyframes even if they don't make sense for that particular payload
format (probably just by ignoring the fields)?
As I understand it, a sample corresponds to a sequence of rtp packets
that have the same timestamp. This corresponds to a frame in Opus and
VP8, and a Picture in VP9. I forget the AV1 term.
A keyframe is just a sample at which you may start decoding a stream
without any knowledge beyond what is in the SDP. The concept maps to
every current codec (or else how would you join a conference in the
middle?), although not always in an obvious manger (think of the
parameter OBUs in AVC, for example).
Joe, shall we rename sample to frame? Too bad for VP9, but I think they
got the terminology wrong.
|
|
A keyframe is a frame with no dependencies to prior frames. https://medooze.medium.com/mastering-the-av1-svc-chains-a4b2a6a23925 is still a very good reference. RTP shouldn't know about this ideally. |
|
@fippo In av1, RTP needs to know if it's a keyframe to set the N bit correctly, this is a limitation in Pion/RTP AV1 impl is that we set the N bit for all seq-headers and not only for key frames, because I didn't want to parse again in the packetizer Lines 1447 to 1490 in 1edc72d While it seems like setting the N flag for all seq headers doesn't break any impls, e.g chrome doesn't even construct full / real frames, because decoders don't require them. https://chromium.googlesource.com/external/webrtc/+/HEAD/modules/rtp_rtcp/source/video_rtp_depacketizer_av1.cc#387 But this change will make us correct Pion AV1 and other codecs impls that needs to be aware about "key frame". A similar limitation is that we don't set the P flag at all in our hevc impl, and we should set it at the end of video seq, This redesign should make all of this possible without parsing again inside RTP (and pion/webrtc and in the user application sometimes e.g with the current design the user has to do avc to annex-b before passing it to pion!). |
In addition to what Joe said, applications need to be able to identify a keyframe. For example, when a client joins a conference, it must not try to decode the stream until it sees a keyframe. |
| } | ||
|
|
||
| // MediaSample is the media input passed to a payload-format packetizer. | ||
| type MediaSample struct { |
There was a problem hiding this comment.
There is no timestamp field here. It would would be needed to implement extensions like ones from RFC 6051, or deal with gaps in media stream.
Description
Note: that this is V3 because rtp/v2 was released by mistake a few years ago, This moves the library more towards a generic RTP library, this is backward compatible with v1 for users with custom packetizers/depacketizers, but we should drop this in v4.
Today the payloader interface is:
and the generic webrtc-centeric packetizer builds RTP packets afterward. In the current interface, the market bit is hardcoded as "true on the last emitted payload fragment" which was true for many webrtc rtp payloads.
However, RTP defines the market bit as profile-specific not last fragment, RTP-MIDI is a clean example of the limitation of this interface #361 This RFC says the marker bit depends on the media type, for mpeg-genric for example it's always one, and this can't be expressed by "last payload wins".
When it comes to the depacketizer, the API has related problems:
Unmarshal only receives the RTP payload, IsPartitionHead receives only payload bytes, and only IsPartitionTail gets the marker bit. The full RTP header, timestamp, sequence number, extensions, padding state, SSRC, and payload type are not available to the depacketizer methods.
Then, The samplebuilder then calls the depacketizer in a way that reinforces this limitation: it checks the tail with Marker and Payload, checks the head with only Payload, and appends the result of Unmarshal(payload) to the sample data.
This broke down for me when I was finishing av1 support because RTP can omit the length field, and had to make the interface stateful to deal with stuff like W !=0, and it wasn't not possible to honor the push / append model.
Many depacketizers, carry mutable depacketization state such as fuaBuffer in H264Packet, so a packet type is also acting as a stateful depacketizer.
this refactors the interface into
which let's us define interfaces for codecs that have different input payloads e.g accepting avc and annex-b h264 without having hacks or random options, Also being able to write avc or annex-b as output without multiple transformation.
Reference issue
Fixes #69 and fixes #70